
















AFOSR-TR-94-0296 


Final Technical Report 


for Period 10-15-89 to 1-14-94 


Approved for public release; distribution unlimited. 


Grant AF-AFOSR-90-0020 


MASSIVELY-PARALLEL COMPUTATIONAL FLUID DYNAMICS 


D. A. Calahan 


Principal Investigator 

Department of Electrical Engineering & Computer Science 
University of Michigan 










I. Goals 


The effort has three major components. 

(1) Gain algorithm experience in converting a suite of Air Force production CFD codes 
to a general format applicable to a variety of commercial parallel architectures. 

(2) Examine the feasibility of using workstation networks for such distributed computation; 
this involves (a) developing timing models of the communication systems of such networks, (b) 
projecting performance of the above codes on such networks, and (c) implementing one or more 
codes, as time permits. 

(3) Initiate research on CFD-based low-radar cross-section analysis on parallel systems; this 
effort is in association with Dr. Joseph Shang at WRDC. 


II. Progress Report 


1989-1990 progress 

The following were grant-sponsored accomplishments. 

(1) Implicit algorithm development. A full 3-D Navier-Stokes Beam-Warming CFD code 
was implemented on a 1024-node scalar NCUBE hypercube at SANDIA (Albuquerque). 

(2) Distributed-workstation architectures for CFD. The University is providing a cluster 
of IBM workstations for distributed algorithm study. Critical timing features of such a system are 
being measured for insertion into the overall timing models associated with the completed explicit and 
implicit AFFDL Navier-Stokes codes; their parallel performance will then be predicted. 

(3) Generic distributed parallel codes. Commercial operating systems are available 
which permit algorithm coding toward a distributed parallel environment that includes most current 
MIMD systems. The EXPRESS system from Parasoft has been adopted, and the completed 
explicit and implicit AFFDL Navier-Stokes codes are being adapted to this software environment. 

(4) Cross-section analysis. Recently proposed CFD-related numerical procedures by Dr. 
Shang on new methods of solving Maxwell's equations in real time are being examined for their 
solvability on parallel systems. This effort will begin in earnest in summer 1991. 


1990-1991 progress 

The following were grant-sponsored efforts. 

(1) Distributed-workstation architectures for CFD. At the time this grant was 
initiated, the only computing resources in Dr. Shang's group with potentially scalable, parallel 
features were a collection of graphic workstations. We decided, after completion of the above 
implicit code conversion, to examine the feasibility of using workstation networks for such 
distributed computation; this involves (a) developing timing models of the communication systems 
of such networks, (b) projecting performance of the above codes on such networks, and (c) 
implementing one or more codes, as time permits. 

The University had provided a cluster of IBM workstations for distributed algorithm study. 
Unfortunately, it was found that, as message-passing architectures, the models provided had long 
latency times even when connected in a local network and, in the general network environment 
provided at the University, these latencies would wither any but an embarrassingly parallel 
algorithm. It was clear that further research with locally-available network technology would 
have no value in demonstrating to Dr. Shang the usefulness of networking his workstations. Also, 
an effort by the University to implement the EXPRESS distributed system on these IBM units was 
unsuccessful. For these reasons, this research effort was abandoned. 
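
The effect of message latency described above can be illustrated with a simple linear timing model. The following is a minimal sketch, not the timing model developed under the grant: it assumes each message costs a fixed startup latency plus a transfer time, and that communication does not overlap computation. All numerical values are hypothetical and chosen only to show the trend.

```python
def message_time(n_bytes, latency_s, bandwidth_bytes_per_s):
    """Linear cost model: fixed startup latency plus transfer time."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def parallel_efficiency(compute_s, n_messages, n_bytes, latency_s,
                        bandwidth_bytes_per_s):
    """Fraction of a time step spent computing, assuming communication
    is not overlapped with computation."""
    comm_s = n_messages * message_time(n_bytes, latency_s,
                                       bandwidth_bytes_per_s)
    return compute_s / (compute_s + comm_s)


# Hypothetical values, not measurements from the report.
COMPUTE_S = 0.010      # computation per node per time step (seconds)
N_MESSAGES = 6         # boundary exchanges per time step
N_BYTES = 800          # bytes per exchange
BANDWIDTH = 1.0e6      # bytes per second

low_latency = parallel_efficiency(COMPUTE_S, N_MESSAGES, N_BYTES,
                                  1.0e-4, BANDWIDTH)
high_latency = parallel_efficiency(COMPUTE_S, N_MESSAGES, N_BYTES,
                                   5.0e-3, BANDWIDTH)
print(f"efficiency at 0.1 ms latency: {low_latency:.2f}")
print(f"efficiency at 5 ms latency:   {high_latency:.2f}")
```

With the per-step computation held fixed, only the latency term changes between the two cases, so the drop in efficiency is attributable to message startup cost alone. An embarrassingly parallel algorithm corresponds to N_MESSAGES near zero, for which the model gives an efficiency near one regardless of latency.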

(2) CFD-based low-radar cross-section analysis on parallel systems. Recently proposed 
CFD-related numerical procedures by Dr. Shang on new methods of solving Maxwell's equations 
in real time were examined for their solvability on parallel systems. Dr. Shang forwarded a 2-D 
code for study. It was found that the algorithm kernel involved a forward-substitution process, 
which, with little computational complexity, was deemed inappropriate for message-passing 
architectures. Some investigation was made of the CM-2 because of its NEWS high-speed 
interconnect. However, it was later felt by Dr. Shang that the sample algorithm was not extendable 
to general 3-D problems, and work was suspended awaiting a new sample serial code. 
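
The difficulty with a forward-substitution kernel can be seen in a short sketch. The triangular system below is a hypothetical stand-in; the report does not give the structure of the actual kernel. The point is only that each unknown depends on the ones before it, so the sweep is inherently sequential, and when each step involves very little arithmetic there is nothing to amortize the cost of a message.

```python
def forward_substitution(lower, rhs):
    """Solve L x = b for a lower-triangular matrix L given as a list of rows."""
    n = len(rhs)
    x = [0.0] * n
    for i in range(n):
        # x[i] needs every earlier x[j]: this loop-carried dependence
        # is what serializes the sweep.
        partial = sum(lower[i][j] * x[j] for j in range(i))
        x[i] = (rhs[i] - partial) / lower[i][i]
    return x


# Hypothetical bidiagonal example: each unknown depends only on its
# predecessor, so there are only a few operations per dependent step.
lower = [
    [2.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
]
rhs = [2.0, 3.0, 3.0, 3.0]
print(forward_substitution(lower, rhs))
```

If the unknowns were spread across nodes, each node would have to wait for a value from its predecessor before doing a handful of operations, so message startup time would dominate.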

(3) In June, Dr. Shang informed us that he had received a start-up effort to exploit the 
DELTA in his work, and that he wished our assistance in the work. Most students familiar with 
these codes had left our project, so it was decided that 

(a) his staff would, beginning with the above-mentioned NCUBE explicit code, carry out a 
conversion to the DELTA of a newer explicit code; 

(b) we would evaluate the feasibility of converting his implicit code from the NCUBE 1 to 
the DELTA. 

Regarding the latter, it was understood that the syntactic conversion would be trivial. However, 
the NCUBE algorithm was converted from the serial form specifically to minimize the number of 
hops in a hypercube interconnect. The price paid was a significant increase in the number of 
messages in the NCUBE version. The relatively low message latency in the NCUBE hardware 
had resulted in a 60% parallelization efficiency. It was obvious that the DELTA would be 
relatively more affected by latency, and an initial evaluation has led us to look elsewhere for 
efficient implicit kernels which could be interfaced with the FDL code. We visited the Parallel 
Systems Division at NASA/ARC for discussions with a researcher engaged in similar activities on 
the INTEL GAMMA. By October 15, the end of this reporting period, we had not obtained access 
to the DELTA, but we had performed some preliminary generic NCUBE-INTEL conversion on the 
Argonne GAMMA. 
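
As an illustration of why hop count is a concern on a hypercube, the sketch below counts the hops between nodes holding neighbouring subdomains under two labelings. The Gray-code mapping shown is a standard technique and is an assumption here; the report does not state how the NCUBE code placed its data.

```python
def hypercube_hops(src, dst):
    """Minimum hops between two hypercube nodes: the Hamming distance
    between their binary labels."""
    return bin(src ^ dst).count("1")


def gray_code(i):
    """Binary-reflected Gray code: consecutive integers map to labels
    that differ in exactly one bit."""
    return i ^ (i >> 1)


DIMENSION = 10                 # 2**10 = 1024 nodes
N_NODES = 1 << DIMENSION

# Natural ordering: subdomains i and i+1 can be many hops apart.
worst_natural = max(hypercube_hops(i, i + 1) for i in range(N_NODES - 1))

# Gray-code ordering: subdomains i and i+1 are always one hop apart.
worst_gray = max(
    hypercube_hops(gray_code(i), gray_code(i + 1))
    for i in range(N_NODES - 1)
)
print(worst_natural, worst_gray)
```

On a machine where per-message startup cost outweighs per-hop cost, an algorithm restructured to shorten paths at the price of sending more messages loses its advantage, which is consistent with the concern raised above about the DELTA.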

We obtained access to the DELTA in late November and attempted to implement our simpler 
NCUBE explicit code on the DELTA. We have encountered a number of system problems, as 
well as I/O programming issues due to the large available local memories on the DELTA, in 
contrast to the NCUBE. 

In summary, we foresee a number of programming and algorithmic issues in achieving a state-of-the-art 
implementation of the FDL implicit code on the DELTA, and we are now evaluating which 
would be appropriate for our grant to study or import from ARC. 


1 This was the version of the NCUBE with 512K bytes/node; it was not the NCUBE2. 






1991-1992 progress 

The following were grant-sponsored efforts. 


(1) CFD-based Computational Electromagnetics (CEM) on parallel systems. 
Recently-proposed CFD-related numerical procedures by Dr. Shang on new methods of solving 
Maxwell's equations in real time were examined for their solvability on parallel systems. In 
previous years of grant effort, Dr. Shang forwarded 2-D and 3-D explicit CEM codes for study. 
These were found by Dr. Shang to have numerical problems and were put aside before 
parallelization. 

In the summer of 1992, Dr. Shang forwarded a new suite of three CEM codes for parallelization. 
An attempt to port these to a recently-purchased KSR at the University was put aside when it 
became clear that the level of KSR compiler support would not permit efficient parallelization. It 
was agreed with Dr. Shang that remaining effort 2 should be spent on the DELTA, which had 
achieved a reasonable level of hardware stability and compiler efficiency. Experience on the KSR 
was useful in giving insight, however. In the process of preparing a code to exploit the KSR’s 
automatic parallelization ("tiling"), a version of the code was developed which could be readily 
converted to a message-passing machine like the DELTA. 

As a result, a two-step algorithm- and code-development procedure was developed. In step (1), 
Professor Calahan carried out most of the parallelization on a reliable uniprocessor mainframe with familiar 
and sophisticated debugging tools; the appropriate DELTA message-passing libraries were 
emulated where necessary. In step (2), this code was converted to the DELTA, principally a 
syntactic step, involving Dr. Shang's CEM staff at WRDC. When the grant terminates, these 
application researchers will then be able to carry on independently. Student assistants at the 
University are also involved in this final parallelization step. It is expected that these three codes 
will be completely parallelized by 3/31/93. A paper abstract on this topic, joint with WRDC, has 
been submitted [2]. 
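
The emulation idea in step (1) can be sketched as follows. This is a minimal illustration under stated assumptions, not the emulation layer actually used: the class EmulatedNetwork and its send and recv methods are hypothetical names and do not correspond to the DELTA library interface. All "nodes" run in one process, and messages are held in per-pair queues, so an ordinary serial debugger can be used on the partitioned algorithm.

```python
from collections import deque


class EmulatedNetwork:
    """Single-process stand-in for a message-passing library
    (hypothetical API, for illustration only)."""

    def __init__(self, n_nodes):
        self.n_nodes = n_nodes
        # One FIFO mailbox per (source, destination) pair.
        self.mailboxes = {
            (s, d): deque() for s in range(n_nodes) for d in range(n_nodes)
        }

    def send(self, src, dst, payload):
        self.mailboxes[(src, dst)].append(payload)

    def recv(self, src, dst):
        # Real hardware would block here; in the emulation an empty
        # mailbox raises IndexError, exposing a scheduling error.
        return self.mailboxes[(src, dst)].popleft()


def exchange_halos(net, subdomains):
    """Each node sends its edge values to its neighbours, then
    receives theirs (1-D partition)."""
    n = net.n_nodes
    for rank, cells in enumerate(subdomains):      # send phase
        if rank > 0:
            net.send(rank, rank - 1, cells[0])
        if rank < n - 1:
            net.send(rank, rank + 1, cells[-1])
    halos = []
    for rank in range(n):                          # receive phase
        left = net.recv(rank - 1, rank) if rank > 0 else None
        right = net.recv(rank + 1, rank) if rank < n - 1 else None
        halos.append((left, right))
    return halos


subdomains = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
halos = exchange_halos(EmulatedNetwork(3), subdomains)
print(halos)
```

Because the numerical code calls only send and recv, moving to the target machine amounts to replacing those two routines with the vendor's calls, which matches the description of step (2) as principally syntactic.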

(2) Distributed CFD implicit code. Based on experience with the above-mentioned two-step 
process, it was felt reasonable to re-institute a project to parallelize a prototype implicit CFD 
code for the DELTA; a previous parallelization for the NCUBE [1] was deemed inappropriate due 
to the relatively long message startup of the DELTA. This project had languished due to the inability 
to find a student sufficiently experienced to carry out the somewhat involved parallelization. It 
is now felt that the above two-step parallelization process involving Professor Calahan in the 
emulation step will make parallelization possible with modest student and WRDC help in the final 
parallelization step. Again, WRDC involvement will have an important educational value. 

We now have in hand the most recent implicit N-S production code from WRDC. Successful 
parallelization will permit DELTA or PARAGON solution within the 3-year period of a DARPA 
contract with the WRDC CFD group. 


2 The grant was scheduled to terminate on 10/14/92. A 1-year no-cost extension has 
been approved. 


1992-1994 progress 


In joint work with Dr. Shang at WRDC, in the Fall of 1992 two serial CEM codes were 
restructured in generic serial formats suitable for easy implementation on distributed parallel 
architectures; also, performance projections were made based on knowledge of the DELTA 
architecture. A total of six generic programs were developed, depending on the number of geometric 
directions to be partitioned (i.e., there were 1-D, 2-D, and 3-D versions of each code). It was later 
decided that the numerical characteristics of one code (the implicit) were not suitable, so effort was 
continued only on the other code. The 1-D and 2-D versions of this explicit code were then parallelized 
on the DELTA, and performance data reported in [2]. Another CEM code was then received from 
Dr. Shang in mid-summer and its parallelization reported in [3]. 
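
The distinction between the 1-D, 2-D, and 3-D versions can be made concrete with a small sketch of block partitioning. The grid size and processor counts below are hypothetical, and the one-cell-deep boundary exchange is an assumption; the report does not give these details for the CEM codes.

```python
from math import prod


def subdomain_shape(grid, procs):
    """Cells per node along each axis; an axis with procs == 1 is
    not partitioned."""
    for n, p in zip(grid, procs):
        if n % p:
            raise ValueError("this sketch assumes the grid divides evenly")
    return tuple(n // p for n, p in zip(grid, procs))


def halo_cells(grid, procs):
    """Boundary cells an interior node exchanges per step, assuming a
    one-cell-deep halo on each partitioned face."""
    shape = subdomain_shape(grid, procs)
    volume = prod(shape)
    total = 0
    for extent, p in zip(shape, procs):
        if p > 1:
            total += 2 * (volume // extent)   # two faces normal to this axis
    return total


GRID = (64, 64, 64)   # hypothetical grid; 64 nodes in every case below
for label, procs in [("1-D", (64, 1, 1)),
                     ("2-D", (8, 8, 1)),
                     ("3-D", (4, 4, 4))]:
    print(label, subdomain_shape(GRID, procs), halo_cells(GRID, procs))
```

For the same number of nodes, partitioning in more directions reduces the data exchanged per node but increases the number of neighbours, and hence the number of messages, so the preferred version depends on the message startup cost of the target machine.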


III. Coupling Activities 


1989-1990 

Air Force Flight Dynamics Laboratory 

The implicit code parallelized by Kominsky (above) was a production CFD code obtained from Dr. 
Joseph Shang, director of the Computational Aerodynamics Group at AFFDL. One visit and 
monthly contacts were made to his laboratory. This completed a study initiated in a previous 
AFOSR grant to develop distributed parallel versions of principal production CFD codes in 
Shang's group. 


1990-1991 

Air Force Flight Dynamics Laboratory 

The implicit code parallelized by Kominsky (above) was a production CFD code obtained from Dr. 
Joseph Shang, director of the Computational Aerodynamics Group at AFFDL. Monthly contacts 
were made to his laboratory in regard to conversion of the NCUBE explicit code to the DELTA. 

1991-1992 

Air Force Flight Dynamics Laboratory 

Bi-monthly visits are made to WRDC to discuss the above-mentioned CEM and CFD codes. 


1992-1994 

Air Force Flight Dynamics Laboratory 

A number of visits were made to WRDC to discuss parallelization of CEM codes. 

Phillips Laboratory, Kirtland AFB 


A visit was made to determine the extent to which the interests and experience of the PI might relate 
to their research in parallel computation. 